The Effect of Network Total Order, Broadcast, and Remote-Write Capability on Network-Based Shared Memory Computing
Authors
Abstract
Emerging system-area networks provide a variety of features that can dramatically reduce network communication overhead. In this paper, we evaluate the impact of such features on the implementation of Software Distributed Shared Memory (SDSM), and on the Cashmere system in particular. Cashmere has been implemented on the Compaq Memory Channel network, which supports low-latency messages, protected remote memory writes, inexpensive broadcast, and total ordering of network packets. Our evaluation is based on several Cashmere protocol variants, ranging from a protocol that fully leverages the Memory Channel’s special features to one that uses the network only for fast messaging. We find that the special features improve performance by 18–44% for three of our applications, but by less than 12% for our other seven applications. We also find that home node migration, an optimization available only in the message-based protocol, can improve performance by as much as 67%. These results suggest that, for systems of modest size, low latency matters much more for SDSM performance than remote writes, broadcast, or total ordering. At the same time, results on an emulated 32-node system indicate that broadcast based on remote writes of widely shared data may improve performance by up to 51% for some applications. If hardware broadcast or multicast facilities can be made to scale, they can be beneficial in future system-area networks.
This work was supported in part by NSF grants CDA–9401142, EIA–9972881, CCR–9702466, and CCR–9705594, and by an external research grant from Compaq. Leonidas Kontothanassis is now with Akamai Technologies, Inc., 201 Broadway, Cambridge, MA 02139.
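The abstract credits home node migration with gains of up to 67% but does not describe the mechanism. The sketch below is a toy cost model, not Cashmere's implementation. It tracks a single page and makes the following assumptions: each non-home writer in an interval sends one diff to the home (one network transfer), moving the home costs one transfer, and a hypothetical policy moves the home to the sole writer of an interval whenever the current home is not itself writing. The function name `network_transfers` is likewise hypothetical.

```python
from typing import Iterable, Set


def network_transfers(intervals: Iterable[Set[int]], home: int, migrate: bool) -> int:
    """Count network transfers for one page under a toy home-based SDSM model.

    intervals: for each synchronization interval, the set of node IDs that wrote the page
    home: node ID of the page's initial home
    migrate: enable the (hypothetical) home-migration policy
    """
    transfers = 0
    for writers in intervals:
        # Assumed policy: if the home is not writing and exactly one other node is,
        # move the home to that writer (cost: one transfer), after which its writes are local.
        if migrate and home not in writers and len(writers) == 1:
            home = next(iter(writers))
            transfers += 1
            continue
        # Every writer other than the home ships its diff to the home node.
        transfers += len(writers - {home})
    return transfers


# A page homed on node 0 but written only by node 1, three intervals in a row.
single_writer = [{1}, {1}, {1}]
print(network_transfers(single_writer, home=0, migrate=False))
print(network_transfers(single_writer, home=0, migrate=True))
```

Under these assumptions, the single-writer pattern costs three transfers without migration and one with it, because after the move node 1's writes no longer cross the network. When the home node keeps writing alongside others, the policy never fires, so migration only pays off for pages whose writer differs from their original home.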
Similar resources
The Effect of Network Total Order, Broadcast, and Remote-Write Capability on Network-Based Shared Memory Computing
Emerging system-area networks provide a variety of features that can dramatically reduce network communication overhead. Such features include reduced latency, protected remote memory access, cheap broadcast, and ordering guarantees. In this paper, we evaluate the impact of these features on the implementation of Software Distributed Shared Memory (SDSM), and on the Cashmere system in particula...
HiperTM: High Performance, Fault-Tolerant Transactional Memory
We present HiperTM, a high performance active replication protocol for fault-tolerant distributed transactional memory. The active replication paradigm allows transactions to execute locally, costing them only a single network communication step during transaction execution. Shared objects are replicated across all sites, avoiding remote object accesses. Replica consistency is ensured by a) OS-...
Application of self organizing maps for investigating network latency on a broadcast-based distributed shared memory multiprocessor
Broadcast-based DSM multiprocessors are nowadays an attractive platform for parallel computing due to their advantages in terms of scalability and programmability. In order to obtain high performance out of these systems, network latency reduction techniques should be developed, which requires the knowledge of the relationship between latency and other important DSM parameters. In this paper, se...
Cluster Programming with Shared Memory on Disk
The advent of high-performance workstation clusters with matching high-performance network interconnects offers the opportunity to compute in new ways. Cluster implementations of our hydrodynamics code typically have left domain decomposed independent data contexts in each cluster member’s memory, sending only an updated “halo” of domain boundary information to neighboring nodes. On these new ne...
A CC-NUMA Prototype Card for SCI-Based PC Clustering
It is extremely important to minimize network access time in constructing a high-performance PC cluster system. For an SCI-based PC cluster, it is possible to reduce the network access time by maintaining network cache in each cluster node. This paper presents a CC-NUMA card that utilizes network cache for SCI-based PC clustering. The CC-NUMA card is directly plugged into the PCI slot of each no...